1. Statement of the Problem Studied

This research is focussed on methods for the numerical solution of nonlinear optimization problems.
Optimization problems constitute an important class of mathematical problems whose computer
solution is desired by scientists and engineers. The need to solve such problems arises in a huge variety
of computer simulations and data analyses, with examples including optimal design of aircraft and
spacecraft for low cost or high efficiency, optimal control of chemical processes, and calculation of the
native states of biological molecules. In recent years, as computer power has increased, the feasibility of
investigating optimal designs has increased, and along with this the need for improved optimization
methods has grown. This requires fundamental research in optimization algorithms, especially for classes
of large-scale problems. In addition, much of this computation must utilize the fastest available
computers, which increasingly are parallel computers. Thus the development of optimization methods that
are well-suited to parallel computers is also a pressing need. The research funded by the grant was in
three areas: the development of global optimization methods for prediction of protein structure by
potential energy minimization, the development of interior point methods for large-scale constrained
optimization, and the development of tensor methods for large-scale systems of nonlinear equations.

One of the main emphases of this research program in recent years has been the development of
global optimization methods for the solution of large molecular configuration problems. Global
optimization means finding the lowest minimizer of a nonlinear function that may have numerous local
minimizers. The objective of this research is to develop methods that are reliably capable of solving
difficult, large nonlinear global optimization problems. Our current work is investigating methods to find
the native configurations of proteins and polymers. This is a problem of great importance in science; it
includes the well-known protein folding problem as well as the investigation of polymers that are used to
make new materials. The resultant optimization problems are very difficult because they have extremely
large numbers of local minimizers that have similar function values.

Another main focus of our research is the development of new algorithms for large-scale
unconstrained and constrained optimization problems, including limited-memory methods for problems with
many thousands of variables, and interior point methods for nonlinearly constrained problems. We are
also investigating theoretical convergence issues for optimization algorithms that have practical
consequences in how problems are formulated, and new approaches to one of the key numerical linear
algebra subproblems that is particular to optimization algorithms.
2. Summary of the Most Important Results

Our main activity in this grant has been the development and testing of techniques for solving
global optimization problems for determining the structure of proteins and polymers. The problem is to
find the lowest energy configuration of a protein or other polymer. This problem is a global optimization
problem because it has a huge number of local minimizers. In addition, locating the lowest (global)
minimizer is very difficult. For proteins, the solution of this problem would represent a solution to the
well-known protein-folding problem.

In previous research periods, we developed a stochastic/perturbation approach for solving
global optimization problems from molecular chemistry. There are four keys to this approach. The first is
a large-scale global optimization methodology that, at each stage of the global optimization procedure,
performs small-scale global optimizations with only a small number of parameters variable and the
remaining parameters temporarily fixed, followed by local minimizations with all parameters varying. The
second is the incorporation of a new, efficient approach towards smoothing the objective function in the
global optimization framework. Initially these two were the backbone of the approach. More recently,
two other aspects have become key to doing work on realistic protein targets. One of these is the
incorporation of predictions from secondary structure prediction methods in the initial phase of our
algorithm, to produce starting configurations with reasonably good secondary structure. The final one is
work with our chemistry partners to use the mismatch between simulation and experimental evidence to
continue to refine the mathematical energy model upon which our global optimization approach relies.
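
The first of these keys, alternating small-scale global searches with full-dimensional local
minimizations, can be illustrated with a minimal sketch. Everything in it is an assumption made for
illustration and is not taken from the project's software: the toy objective (the Rastrigin function,
standing in for a potential energy), the use of plain random sampling as the small-scale global step,
the compass-search local minimizer, and all function and parameter names.

```python
import math
import random


def energy(x):
    """Toy objective with many local minimizers (Rastrigin); a stand-in for an energy model."""
    return sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) + 10.0 for xi in x)


def local_minimize(f, x, step=0.25, tol=1e-6):
    """Local minimization with ALL parameters varying (simple compass search)."""
    x = list(x)
    fx = f(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x[:]
                trial[i] += delta
                ft = f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
                    break
        if not improved:
            step *= 0.5  # no axis move helped: refine the step
    return x, fx


def small_scale_global(f, x, free, rng, samples=200, lo=-5.12, hi=5.12):
    """Search only over the few 'free' parameters; all others stay temporarily fixed.

    Plain random sampling is used here as a simplification of a real
    small-scale global optimization.
    """
    best, fbest = list(x), f(x)
    for _ in range(samples):
        trial = list(x)
        for i in free:
            trial[i] = rng.uniform(lo, hi)
        ft = f(trial)
        if ft < fbest:
            best, fbest = trial, ft
    return best, fbest


def two_phase(f, x0, stages=20, k=2, seed=0):
    """Repeat: pick k parameters, search them globally, then polish all parameters locally."""
    rng = random.Random(seed)
    x, fx = local_minimize(f, x0)
    for _ in range(stages):
        free = rng.sample(range(len(x)), k)        # hypothetical selection rule: uniform at random
        cand, _ = small_scale_global(f, x, free, rng)
        cand, fc = local_minimize(f, cand)
        if fc < fx:                                # keep the candidate only if it improves
            x, fx = cand, fc
    return x, fx
```

Calling `two_phase(energy, [3.3] * 6)` returns a point whose energy is never worse than that of a single
local minimization from the same start, since a stage is accepted only when it lowers the energy.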

Originally, our research had been applied primarily to molecular clusters, and to small, alpha-helical
proteins. We had developed good biasing methods for creating predicted alpha-helical secondary
structure at the start of our method, and, in the last research period, had shown that our global
optimization approach could do a reasonably good job of predicting the full tertiary structure of several
helical proteins of about 70 amino acids. We had also demonstrated the effectiveness of our smoothing
approach.

In the course of this grant, we evolved our approach to be able to handle proteins with arbitrary
structure. The crux of this issue, for us and other groups doing related research, is the ability to handle
beta-sheets. Beta-sheets are the other main type of secondary structure in proteins. However, they are far
less local than alpha-helices. While an alpha-helix is a single continuous stretch of the chain, a
beta-sheet is formed by several strands, each contiguous, that can be arbitrarily far apart along the
chain. Secondary structure prediction programs can predict the strands with good accuracy, but they do
not predict which strands are bonded together, nor the parallel or anti-parallel orientation of those bonds.

One main component of our research was the development of a biasing function that, given
predictions of which amino-acids are bonded together to form the beta-sheet, influences the protein to form
these bonds. Biasing functions are simply penalty functions from optimization that are added to the
energy function. We constructed a function that is a combination of a sigmoid at low distances and a
linear function at higher distances, to balance the need to form the bonds but not to overly bias. Along
with this, we built upon existing software from the bio-chemistry community to construct techniques to
predict the several most likely combinations of the predicted beta strands into beta sheets. The tests we
describe later in this section, as well as earlier, simpler tests, clearly demonstrated the viability of
these new approaches.
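
To make the shape of such a biasing term concrete, here is one possible sketch of a penalty that is a
sigmoid at low distances and linear at higher distances. The report does not give the actual formula, so
the functional form, the choice to switch at the sigmoid's inflection point (which keeps the penalty
continuous with a continuous slope), and the parameter names and values `d0`, `s`, and `w` are all
assumptions for illustration.

```python
import math


def beta_bias(d, d0=6.0, s=1.0, w=1.0):
    """Hypothetical penalty on the distance d between two residues predicted to bond.

    For d <= d0 the penalty follows a sigmoid, so it flattens out toward zero as the
    pair closes up. For d > d0 it continues as the tangent line of the sigmoid at d0,
    so distant pairs feel a constant pull rather than a saturating one.
    d0, s, and w are illustrative values, not taken from the report.
    """
    t = (d - d0) / s
    if t <= 0.0:
        return w / (1.0 + math.exp(-t))   # sigmoid branch (low distances)
    return w * (0.5 + 0.25 * t)           # linear branch; 0.25 is the sigmoid slope at t = 0


def biased_energy(energy, pair_distances, weight=1.0):
    """Add the penalty for every predicted bonded pair to a given energy value."""
    return energy + weight * sum(beta_bias(d) for d in pair_distances)
```

With this form the penalty grows with distance everywhere, but only linearly for far-apart pairs, which
is one way to encourage the predicted bonds without letting the bias dominate the energy function.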

Simultaneously with the beta sheet research, we continued to refine the heart of the global
optimization algorithm. As the protein sizes increase, the selection of roughly 5-6 dihedral angles (out of
200-500) at each stage to be the parameters in the small-scale global optimization becomes even more
crucial. Not only should these be the parameters whose variation can lead to improvements in the
tertiary structure, but they must also be a set that can be varied without destroying good secondary
structure. We have developed new approaches to selecting these parameters that emphasize selecting from
portions of the protein that are not parts of the regions of secondary structure. We also have begun to
develop ways of selecting a set of angles from turns connecting beta sheets so as not to destroy the beta
sheet.
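
The idea of drawing the few variable angles from outside the secondary structure regions can be sketched
as follows. This is only an illustration under stated assumptions: the three-letter labelling of residues
("H" for helix, "E" for strand, "C" for everything else), the two backbone angles per residue, the uniform
random choice, and the name `select_angles` are all hypothetical and do not describe the project's actual
selection rule.

```python
import random


def select_angles(ss, k=6, seed=None):
    """Pick up to k backbone dihedral angles from residues outside helices and strands.

    ss is a string with one label per residue: "H" (helix), "E" (strand), "C" (other).
    Returns a sorted list of (residue_index, angle_name) pairs.
    """
    rng = random.Random(seed)
    candidates = [
        (i, name)
        for i, label in enumerate(ss)
        if label == "C"                 # skip residues inside secondary structure
        for name in ("phi", "psi")
    ]
    return sorted(rng.sample(candidates, min(k, len(candidates))))
```

For example, `select_angles("CCHHHHHHCCCEEEECC", k=6, seed=1)` returns six angles, all belonging to
residues labelled "C", so varying them leaves the helix and strand residues untouched.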


Equally important is the ability to identify structurally different protein configurations to work upon.
This issue generalizes to any global optimization problem where we want to spread our effort over the
search space. During this grant we formulated and implemented a clustering technique that takes all the
currently active configurations and groups them into clusters of similarly shaped configurations. Our
algorithm then proceeds with only one configuration from any given cluster at a time. The clustering is
also used to determine when to stop our algorithm, based upon whether or not the number of clusters is
still growing.
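
A minimal sketch of this kind of clustering is given below. The report does not specify the similarity
measure or the clustering algorithm, so the choices here are assumptions: configurations are represented
as vectors of dihedral angles in degrees, similarity is the root-mean-square of the wrapped angle
differences, clusters are built greedily around a leader, and the configuration pursued from each cluster
is its lowest-energy member. All names and the threshold value are hypothetical.

```python
import math


def angle_rms(a, b):
    """RMS difference of two dihedral-angle vectors (degrees), with wrap-around at 360."""
    total = 0.0
    for ai, bi in zip(a, b):
        diff = (ai - bi + 180.0) % 360.0 - 180.0   # wrapped into [-180, 180)
        total += diff * diff
    return math.sqrt(total / len(a))


def cluster_configs(configs, threshold=30.0):
    """Greedy leader clustering: join the first cluster whose leader is within threshold."""
    clusters = []   # each cluster is a list of indices into configs; the first is its leader
    for idx, conf in enumerate(configs):
        for members in clusters:
            if angle_rms(conf, configs[members[0]]) <= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])   # no similar leader found: start a new cluster
    return clusters


def representatives(clusters, energies):
    """Choose one configuration per cluster (here: the one with the lowest energy)."""
    return [min(members, key=lambda i: energies[i]) for members in clusters]
```

Tracking `len(cluster_configs(...))` from stage to stage then gives the quantity used in the stopping
test: if the number of clusters has stopped growing, the search is no longer finding new shapes.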

An important milestone in our research was our participation in the fourth Critical Assessment of
Techniques for Protein Structure Prediction (CASP4) competition in summer 2000. This competition,
held every two years in the bio-chemistry community, invites any interested groups to blindly predict the
structure of proteins that are about to be experimentally analyzed. About 170 groups entered this year.
These groups utilize many different approaches, most based upon comparison to the structures of known
proteins. Our approach, which uses only secondary structure prediction and no sequence matching, is at
the pure end of the spectrum and is particularly important for predicting "new folds" that do not closely
match known proteins. We spent the summer predicting eight proteins, with sizes ranging from 56 to 242
amino acids. Several of these were helical but the majority were mixed alpha-beta proteins. Overall our
group did quite well in this competition. Our predictions were in the top quartile of all groups for the
three proteins that were in the top 15% of difficulty, and were the best result of all groups for the
hardest protein we attempted, with 242 amino acids.

Based in part on the results of this competition, we have been able to make several improvements in
the algorithm, particularly in the area of predicting beta sheets.

In addition, we have continued our collaborations with Jorge Nocedal at Northwestern University on the
development of a robust algorithm for nonlinearly constrained optimization. During this research period
we have continued to develop a software package for this problem, KNITRO, and conducted extensive
comparative tests of it and several other leading packages. The tests show that no one method is
preferable in all situations, but that KNITRO is a very competitive method. We have added several
enhancements to KNITRO, including a strictly feasible version and a quasi-Newton version.


During this research period we also began work on new tensor methods for very large-scale systems of
nonlinear equations. We have developed a new approach for iteratively solving the tensor model that avoids
the cost of a second backsolve that the previous approach had. We have also developed a new curvilinear
line search for tensor methods that eliminates the need to use the tensor and Newton directions separately
and that produces monotonic descent on the tensor model.